Basic Git Version Control

Git is a distributed version control system which tracks changes in a set of files. Its goals include speed and efficiency, data integrity, and support for non-linear workflows. It was originally created by Linus Torvalds in 2005 for the development of the Linux kernel. As a distributed version control system, every directory using Git is a full-fledged repository, with complete history and full version-tracking abilities, which is independent of network access or a central server. These notes rely on the ideas and learnings from the Git documentation and "Pro Git", 2nd Edition, by Scott Chacon, Ben Straub, and various other contributors. Git is free and open-source software distributed under the GPLv2.0-only licence.

https://git-scm.com/docs/git https://github.com/progit/progit2

Initial Background

The main difference between distributed version control and centralized version control is that distributed version control locally mirrors the entire repository, including its complete history, for each client, rather than only locally checking out the latest working copy or snapshot while the rest of the repository remains on a centralized server. A distinguishing attribute of Git is that it considers files within a repository as snapshots of a miniature file system for each version - in other words, whenever a commit is made, a snapshot of all files at that moment is efficiently referenced (along with the creation date, commit details, and commit message) and stored with pointers to the parent commits. Every stored object is identified by a checksum (a SHA-1 hash by default), and a file which has not changed is simply referenced again through the checksum of its existing blob rather than being stored a second time. This is in contrast to most alternative version control options which focus on storing changes made to files over time (known as delta-based version control). As a result, most internal operations in Git only need local files and resources to operate without interacting with a network.
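The checksum-based addressing described above can be sketched from the shell. This is only a minimal illustration, assuming git and the coreutils sha1sum utility are installed and that the default SHA-1 object format is in use; the sample content "hello" is arbitrary. It suggests that the identifier of a blob is the hash of a short header followed by the file contents, so identical contents always map to the same object.

```shell
set -eu
# Hash the contents "hello\n" as a blob without writing it to any repository.
via_git=$(printf 'hello\n' | git hash-object --stdin)
# Reproduce the hash manually: header "blob <size in bytes>\0" followed by the contents.
manual=$(printf 'blob 6\0hello\n' | sha1sum | cut -d ' ' -f 1)
echo "git hash-object: $via_git"
echo "manual SHA-1:    $manual"
```

Both lines should print the same 40-character hash, regardless of the name or location of the file which holds the content.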

Illustration of the difference between Git as a mini file system compared to delta-based version control:

Because of this, an attribute which makes Git stand out from alternative version control options is its branching model. This model allows and encourages clients to have multiple local branches which can be entirely independent of each other, while then allowing for simple creation, merging, deletion, and switching between the different branches. These branches can also be used based on individual roles and features as well, such as for development, experimentation, testing, and production. Moreover, it is possible to segment development locally, selectively push individual branches to the remote repository, and adapt based on the workflow of the project. In this way, each local repository is also a complete clone of the entire repository, even if only a specific branch is checked out, which allows for each client to have a full backup of the project (which can be easily restored if the main server fails or becomes corrupted).

Example of branching using lines of development for production, crucial issues, releases, exploration, and features:

Illustrations of subversion-style, integration manager, and dictator-lieutenants workflows commonly used with Git:

Relative to Git, there are 3 main states in which a file can reside. These are modified for files which have been changed but not yet staged (associated with the working tree), staged for files which have been changed and marked to be included in the next commit (associated with the staging area or index), and committed for files whose changes have been stored in the local repository (associated with the .git directory with the metadata and object database for the repository). Once committed, the files are seen as unmodified until they are changed again. A file may also be categorized as untracked when it is first created or if it is ignored by Git.

Illustration of the lifecycle of the status of files in a repository as they change over time:

When setting up Git, there are various configuration variables stored in a configuration file which control the operational aspects. The location of the configuration file is at project/.git/config for a project (default behaviour), ~/.gitconfig or ~/.config/git/config for the global user, or /etc/gitconfig for the system (requires administrative or superuser privilege to change). The main configuration variables include a username and email address as an identity which is immutably baked into commits. Other configuration variables which can be queried, set, replaced, or unset include the default text editor, name of the default branch upon initialization, pruning behaviour when fetching, template for commit messages, set of files to ignore, response time for auto-correction, and aliases for a custom command.
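The precedence between the configuration files can be sketched as follows. This is an illustrative script rather than a recommended setup: it points HOME at a scratch directory so that the real global configuration is not touched, and the names "Global Name" and "Project Name" are placeholders.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Use a scratch HOME so the real ~/.gitconfig is left untouched.
export HOME="$work/home" XDG_CONFIG_HOME="$work/home/.config"
mkdir -p "$HOME"
git config --global user.name "Global Name"     # written to $HOME/.gitconfig
git init -q "$work/project"
cd "$work/project"
before=$(git config user.name)                  # only the global value exists so far
git config --local user.name "Project Name"     # written to .git/config
after=$(git config user.name)                   # the local value takes precedence
echo "before: $before"
echo "after:  $after"
# List every value for the variable together with the file it came from.
git config --show-origin --get-all user.name
```

The value in the configuration file of the project wins over the global value, which in turn wins over the system value.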

Overview of all of the arguments and options which can be used when executing a command:

~ $ git [--version] [--help] [-C <path>] [-c <name>=<value>] [--exec-path[=<path>]]
        [--html-path] [--man-path] [--info-path] [--paginate|--no-pager] [--no-replace-objects]
        [--bare] [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
        [--super-prefix=<path>] [--config-env=<name>=<envvar>] <command> [<args>]

Display the help information about Git or a specific command and available arguments:

~ $ git help --all

Show the current configuration of the local repository, global user setup, or system setup:

~ $ git config --list

~ $ git config --list --local

~ $ git config --list --global

~ $ git config --list --system

Set the name and email address which identify the author for credit when reviewing history and which are recorded with each commit:

~ $ git config --global user.name "First Last"

~ $ git config --global user.email "firstlast@mail.com"

Query, set, replace, or unset other configuration variables or edit the configuration file directly:

~ $ git config --global --edit

~ $ git config --global core.editor "nano -w"

~ $ git config --global init.defaultBranch "main"

~ $ git config --global pull.rebase "false"

~ $ git config --global fetch.prune true

~ $ git config --global commit.template ~/.gitmessage.txt

~ $ git config --global core.excludesfile ~/.gitignore.txt

~ $ git config --global help.autocorrect 1

Create an alias for a custom command (or shorthand) or incorporate an external command:

~/project-directory $ git config --global alias.unstage "reset HEAD --"

~/project-directory $ git config --global alias.empty "commit --allow-empty"

~/project-directory $ git config --global alias.last "log -1 HEAD"

~/project-directory $ git config --global alias.visual "!gitk"

Repository Setup

A new local repository can be created at the current directory or another specific target directory. The creation of a repository involves the initialization of the necessary configuration files and skeleton for the repository under .git. Alternatively, it is possible to clone an existing remote repository through Hypertext Transfer Protocol or Secure Shell Protocol, where this repository will already have existing configuration files and skeleton. Once the repository has been initialized, a working copy will be automatically checked out from the initial branch and any files within the repository will be tracked by Git and versioned based on the staging and commits.

Create a new local repository in the current directory or at a specific target directory:

~ $ git init

~ $ git init ./project

Clone an existing remote repository through either Hypertext Transfer Protocol or Secure Shell Protocol:

~ $ git clone https://example.com/project.git

~ $ git clone ssh://example.com/project.git

~ $ git clone https://example.com/project.git ./project-directory

~ $ git clone https://example.com/project.git --origin nameRemote

~ $ git clone https://example.com/project.git --branch nameBranch

It is good practice to commit snapshots of changes each time the branch reaches a suitable position to be recorded. Once a suitable position is reached, the relevant files and their changes are added to the staging area and, once all modified and new files are staged, the commit is made to the repository with a message describing the modifications and additions. For reference when staging files, it is possible to check the state of files which have changed since the last commit and any new files in the working tree, as well as the files which are currently added to the staging area for the next commit. The exact changes in modified files can also be compared, with the differences and associated line numbers being detailed.

It should be noted that a new file will initially be untracked when it is created - specifically any files in the working tree are untracked if they were not in the previous snapshot and are not in the staging area. Any changes made to untracked files are not recorded as changes, such that it is necessary for them to be added to the staging area before changes are tracked. Once a file has been staged and committed for the first time, it will be tracked and any subsequent changes will be noticed. It is also possible to set a specific file or directory, types of files, or standard glob patterns (simplified regular expressions) to be intentionally ignored from being tracked within .gitignore in the root of the repository, applied recursively to the entire repository (it is also allowable to have a .gitignore in sub-directories applied relative to the path of the sub-directory). (With regard to a short status, the left column describes the staging area and the right column describes the working tree: ?? refers to new files which are not tracked, A refers to new files which have been added to the staging area, M in the left column refers to modified files which have been staged, M in the right column refers to modified files which have not been staged, and MM refers to modified files which have been staged and then modified again.)
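The short status codes can be reproduced in a scratch repository. The following is a sketch only, with hypothetical file names and a placeholder identity, which walks files through the untracked, added, and modified states.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/repo"
cd "$work/repo"
git config user.name "First Last"
git config user.email "firstlast@mail.com"

echo "one" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt."

echo "new" > untracked.txt        # never staged                  -> ??
echo "new" > added.txt
git add added.txt                 # new file which is staged      -> A
echo "two" >> tracked.txt
git add tracked.txt               # modification which is staged  -> M (left column)
echo "three" >> tracked.txt       # modified again, not staged    -> MM
status=$(git status --short)
echo "$status"
```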

Check the state of files which have changed since the last commit and any new files in the working tree:

~/project-directory $ git status

~/project-directory $ git status --short

Add an unstaged file or files in a directory to the staging area for the next commit (updates the index):

~/project-directory $ git add "Example.txt"

~/project-directory $ git add "./directory/example"

~/project-directory $ git add "."

Check the exact changes of files which have been modified in the working tree or in the staging area since the last commit:

~/project-directory $ git diff

~/project-directory $ git diff --staged

~/project-directory $ git difftool

Examples of common files, directories, and glob patterns specified in .gitignore which are not tracked (a leading ! negates a pattern to include matching files again):

# Comment
*.log
[ABC].pt
[0-9].md
Directory/
/TODO.txt
!Extra/**/*.log
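The effect of ignore patterns can be checked by listing the untracked files which Git still considers. This is a minimal sketch using two of the patterns from the example above and hypothetical file names; it assumes no global ignore file interferes.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/repo"
cd "$work/repo"
# Ignore every log file, then include log files below Extra/ again.
printf '%s\n' '*.log' '!Extra/**/*.log' > .gitignore
mkdir -p Extra/sub
touch debug.log notes.txt Extra/sub/keep.log
# List the untracked files which are not excluded by the ignore rules.
visible=$(git ls-files --others --exclude-standard)
echo "$visible"
```

The listing is expected to contain notes.txt and Extra/sub/keep.log (as well as .gitignore itself), but not debug.log.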

Once the files are staged with acceptable changes, they can be committed to create a snapshot of the repository with a SHA-1 hash as a checksum for reference. In aggregate, all the commits form the history logs of the repository and they should be documented with descriptive messages. It is good practice for the message of the commit to generally describe what the effect of the changes would be if they were to be merged into another branch. There are also commands available to prepare files for being committed directly, as opposed to using system commands from the command line and then staging the modifications afterwards (such as removing, moving, and renaming).

Create a commit as a snapshot of the repository with a message describing the changes in the modified files:

~/project-directory $ git commit

~/project-directory $ git commit --message "Description of the effect of the changes."

~/project-directory $ git commit --all --message "Description of the effect of the changes."

View the history logs of the repository with a list of commits in reverse chronological order:

~/project-directory $ git log -3

~/project-directory $ git log --patch

~/project-directory $ git log --stat

~/project-directory $ git log --pretty=oneline --graph

~/project-directory $ git log --pretty=format:"%h - %an, %ar: %s"

~/project-directory $ git log --grep="Search in commit messages."

Common commands as substitutes for system commands with the changes automatically staged:

~/project-directory $ git rm "Example.txt"

~/project-directory $ git mv "Directory 1/Example.txt" "Directory 2/Example.txt"

A commit can be amended by making and staging the changes to create a new commit which replaces the previous commit, such that the previous commit no longer appears in the history of the branch. In most cases, it is good practice to only amend commits which are only in the local repository and have not been pushed to the remote repository (otherwise it will be necessary to force the push which can cause issues for other clients). The contents of a file can also be restored to unmodify a modified file or unstage a staged file. If there has been an error with staging or incorrect commits, the local repository can also be reset to a previous commit with changes either being kept or discarded relative to the working tree and staging area. Similarly, it is possible to revert a modified file to its contents in a previous commit or in the staging area. However, caution should always be applied when undoing changes, as it is not always possible to easily redo something which has been mistakenly undone.
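The behaviour of amending and of a soft reset can be sketched in a scratch repository. The file name, identity, and commit messages below are placeholders chosen for illustration.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/repo"
cd "$work/repo"
git config user.name "First Last"
git config user.email "firstlast@mail.com"

echo "one" > Example.txt
git add Example.txt
git commit -q -m "Add example."
echo "two" >> Example.txt
git commit -q -a -m "Extend exmaple."          # the message contains a typo
old=$(git rev-parse HEAD)

git commit -q --amend -m "Extend example."     # replaces the latest commit
new=$(git rev-parse HEAD)
count=$(git rev-list --count HEAD)             # the number of commits is unchanged

git reset -q --soft HEAD~1                     # move the branch back, keep changes staged
staged=$(git diff --staged --name-only)
echo "old=$old new=$new count=$count staged=$staged"
```

The amended commit receives a new hash while the history still holds 2 commits, and the soft reset leaves the changes of the removed commit in the staging area.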

Modify the most recent commit on the current branch to change the message or to include additional staged changes:

~/project-directory $ git commit --amend --message "Description of the effect of the changes."

Restore the contents of a file in the working tree to unmodify a modified file or unstage a staged file:

~/project-directory $ git restore --source main~3 "Example.txt"

~/project-directory $ git restore --staged "Example.txt"

~/project-directory $ git restore --staged --worktree "Example.txt"

Reset the current branch and optionally contents of files in the working tree to their contents in a previous commit:

~/project-directory $ git reset "Example.txt"

~/project-directory $ git reset --merge HASH123

~/project-directory $ git reset --soft HEAD~3

~/project-directory $ git reset --mixed HEAD~3

~/project-directory $ git reset --hard nameBranch

Restore the files in the working tree to their contents in the latest commit of a specified branch:

~/project-directory $ git checkout --force nameBranch

The default remote repository will be implicitly added with the shortname origin when the repository is cloned (a repository which is newly initialized has no remote repository until one is added explicitly). Additional remote repositories can be added explicitly when working with several collaborators. To synchronize local changes on a branch with a remote repository, it is necessary to push the commits of the changes to the specified remote repository (updates the remote branch to the local state, which is only accepted by default if it is a fast-forward). To check whether there have been changes to remote repositories, it is necessary to fetch the latest information of the remote repositories which is not yet in the local repository, along with the objects necessary to complete the histories of the updated branches. To synchronize remote changes on a branch with a local repository, it is necessary to pull the commits of the changes from the specified remote repository (fetches and then merges remote states to local states, where fast-forward is used by default if possible). If the local and remote branches have diverged, it will be necessary to specify how to reconcile the divergent branches. It is also necessary to have read access to the remote repository for fetching and write access for pushing.
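The interaction between pushing, fetching, and merging can be sketched without a network by letting a bare repository on the local disk stand in for the remote server. The directory names remote.git, alice, and bob are hypothetical, as is the identity.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q --bare remote.git                  # stands in for the remote server
git -C remote.git symbolic-ref HEAD refs/heads/main

git clone -q remote.git alice                  # cloning adds the remote as origin
cd "$work/alice"
git symbolic-ref HEAD refs/heads/main
git config user.name "First Last"
git config user.email "firstlast@mail.com"
echo "one" > Example.txt
git add Example.txt
git commit -q -m "Add example."
git push -q origin main

cd "$work"
git clone -q remote.git bob                    # second client with the full history
cd "$work/alice"
echo "two" >> Example.txt
git commit -q -a -m "Extend example."
git push -q origin main

cd "$work/bob"
git fetch -q origin                            # only updates origin/main
behind=$(git rev-list --count main..origin/main)
git merge -q --ff-only origin/main             # fetch followed by merge is what pull does
alice_head=$(git -C "$work/alice" rev-parse HEAD)
bob_head=$(git rev-parse HEAD)
echo "behind=$behind alice=$alice_head bob=$bob_head"
```

After the fetch, the local branch of the second client is 1 commit behind its remote-tracking branch; after the merge, both clients point at the same commit.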

View the shortnames and URLs of each remote repository which has been specified, and add, rename, prune, or remove a remote repository:

~/project-directory $ git remote --verbose

~/project-directory $ git remote show nameRemote

~/project-directory $ git remote add nameRemote git://example.com/repository.git

~/project-directory $ git remote rename nameRemoteOld nameRemoteNew

~/project-directory $ git remote prune nameRemote

~/project-directory $ git remote remove nameRemote

Synchronize and merge the local commits of the changes on a branch with a remote repository:

~/project-directory $ git push --all

~/project-directory $ git push nameRemote nameBranch

~/project-directory $ git push nameRemote nameBranch --force

Fetch the latest information of the remote repositories which is not yet in the local repository:

~/project-directory $ git fetch --all

~/project-directory $ git fetch nameRemote nameBranch

~/project-directory $ git fetch --multiple nameRemote nameRemote nameRemote

~/project-directory $ git fetch --prune nameRemote

~/project-directory $ git fetch nameRemote --force

Synchronize and merge the remote commits of the changes on a branch with a local repository:

~/project-directory $ git pull --all

~/project-directory $ git pull nameRemote nameBranch

~/project-directory $ git pull nameRemote nameBranch --force

~/project-directory $ git fetch --all
~/project-directory $ git merge nameRemote/nameBranch

Tagging provides the ability to mark specific points in the history of a repository - typically for version release points. There is support for lightweight and annotated tags. A lightweight tag is a pointer to a specific commit and meant for private or temporary object labels (in a sense, this can be thought of as a branch which does not change). An annotated tag is stored as an object with a checksum, creation date, tagger details, tagging message, and optional signature for verification. A tag can also be set retrospectively to a specific commit with its checksum. As with branches, it is also necessary to push the tags explicitly to the specified remote repository, as they are not transferred by a normal push.
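The difference between the 2 kinds of tags can be observed by asking Git for the type of object behind each name. This is a small sketch with hypothetical tag names (nameLight and nameAnnotated) in a scratch repository.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/repo"
cd "$work/repo"
git config user.name "First Last"
git config user.email "firstlast@mail.com"
echo "one" > Example.txt
git add Example.txt
git commit -q -m "Add example."

git tag nameLight                                  # lightweight: a plain pointer
git tag --annotate nameAnnotated --message "This is a new version for release."
light_type=$(git cat-file -t nameLight)            # resolves directly to the commit
annotated_type=$(git cat-file -t nameAnnotated)    # resolves to a separate tag object
target=$(git rev-parse 'nameAnnotated^{commit}')   # commit behind the tag object
head=$(git rev-parse HEAD)
echo "lightweight=$light_type annotated=$annotated_type"
```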

List the existing tags and view a specific tag:

~/project-directory $ git tag --list --verbose "*"

~/project-directory $ git show nameTag

Create a lightweight tag (pointer to a specific commit):

~/project-directory $ git tag nameTag

Create an annotated tag with a message (checksum, creation date, and tagger details also stored):

~/project-directory $ git tag --annotate nameTag --message "This is a new version for release."

~/project-directory $ git tag --annotate nameTag --message "This is a signed version." --sign

~/project-directory $ git tag --annotate nameTag HASH123

Delete an existing tag (or push a null value to the remote repository after deleting):

~/project-directory $ git tag --delete nameTag
~/project-directory $ git push nameRemote --delete nameTag

~/project-directory $ git tag --delete nameTag
~/project-directory $ git push nameRemote :refs/tags/nameTag

Synchronize and merge the local tags (or only annotated tags) with a remote repository:

~/project-directory $ git push nameRemote nameTag

~/project-directory $ git push nameRemote --tags

~/project-directory $ git push nameRemote --follow-tags

Branching And Merging

As alluded to earlier, branching involves creating an alternate and independent line which diverges from the main line of development. This functionality is intuitive in Git, as it is exceptionally lightweight for managing and switching between different branches. This is possible due to the nature in which snapshots are used to store commits and changes to files through blobs (content-addressable filesystem as a key-value data store for the versions of the files), trees (directory listings which map each path to the checksum of a blob or sub-tree, allowing for the re-creation of a file at any point), and the index (list of the resources needed to create the full tree of directories and files which are used to form the next commit). So, a branch is simply a movable pointer to a specific commit (using SHA-1 hashes). The default branch name is master (although this is often renamed to main in newer projects).
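The relationship between commits, trees, blobs, and branches can be inspected with low-level commands. The following sketch uses a scratch repository with a hypothetical directory and file.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "First Last"
git config user.email "firstlast@mail.com"
mkdir Directory
echo "one" > Directory/Example.txt
git add .
git commit -q -m "Add example."

commit_type=$(git cat-file -t HEAD)                        # the commit object
tree_type=$(git cat-file -t 'HEAD^{tree}')                 # its top-level tree
blob_type=$(git cat-file -t HEAD:Directory/Example.txt)    # the file contents
git cat-file -p HEAD                  # tree hash, author, committer, and message
git cat-file -p 'HEAD^{tree}'         # entries pointing at sub-trees and blobs
# A branch is a reference which holds the hash of a single commit.
branch_hash=$(git rev-parse refs/heads/main)
head_hash=$(git rev-parse HEAD)
echo "$commit_type -> $tree_type -> $blob_type"
```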

Example of the association between commits and their respective trees and blobs:

Example of creating alternate and independent lines which diverge from the main line of development:

A new branch can be created which is essentially associated with a new movable pointer to a specific commit. It is possible to switch to a different branch by checking it out to be the current branch. To distinguish the current branch, there is a special reference as HEAD pointing to the local branch which is currently checked out (in a detached state, the HEAD does not point to any branch, but instead points directly to a specific commit, such as after checking out a tag or a remote-tracking branch). It should be emphasized that the files of the project will always reflect the latest commit of the current branch together with any modifications which have not yet been committed. For organization, it can be convenient to name branches with a prefix in the form of prefix/shortname for a hierarchical scheme, where the prefix refers to the type, topic, developer, or team for which the branch is intended to be used.
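The role of HEAD can be sketched by querying it before and after detaching. The branch name feature/login below is a hypothetical example of the prefix/shortname scheme.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "First Last"
git config user.email "firstlast@mail.com"
echo "one" > Example.txt
git add Example.txt
git commit -q -m "Add example."

git branch feature/login                    # new pointer at the current commit
git checkout -q feature/login               # HEAD now refers to the new branch
current=$(git symbolic-ref --short HEAD)
git checkout -q --detach main               # HEAD now points directly at a commit
if git symbolic-ref -q HEAD > /dev/null; then detached=no; else detached=yes; fi
echo "current=$current detached=$detached"
```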

List the existing branches and view a specific branch:

~/project-directory $ git branch --list --verbose "*"

~/project-directory $ git show nameBranch

Create a new branch (movable pointer to a specific commit):

~/project-directory $ git branch nameBranch

~/project-directory $ git branch nameBranch startPoint

Rename an existing branch and maintain the associated history:

~/project-directory $ git branch --move nameBranchOld nameBranchNew
~/project-directory $ git push --set-upstream nameRemote nameBranchNew
~/project-directory $ git push nameRemote --delete nameBranchOld

Delete an existing branch (or push a null value to the remote repository after deleting):

~/project-directory $ git branch --delete nameBranch
~/project-directory $ git push nameRemote --delete nameBranch

~/project-directory $ git branch --delete nameBranch
~/project-directory $ git push nameRemote :nameBranch

Switch to a different branch by checking it out to be the current branch or creating it if it does not exist:

~/project-directory $ git checkout nameBranch

~/project-directory $ git checkout -b nameBranch

~/project-directory $ git checkout -b nameBranch startPoint

~/project-directory $ git switch nameBranch

~/project-directory $ git switch --create nameBranch

~/project-directory $ git switch --create nameBranch startPoint

Create a local branch which tracks a remote branch or change the remote branch which is tracked:

~/project-directory $ git checkout --track nameRemote/nameBranch

~/project-directory $ git branch --set-upstream-to nameRemote/nameBranch

Merging considers the current branch and another branch forming an independent line of development and integrates the other branch into the current branch. The sequence of commits will be considered from the point at which their histories diverged and these will be combined into a unified history. A fast-forward is possible when the commits to be merged can be reached by linearly following the history from the current branch, such that it is possible to simply move the pointer of the branch forward, because there are no divergent changes to take into account (such that it is not necessary to make a commit). If a fast-forward is not possible (divergent changes within the branches), a 3-way merge (or true merge) will be performed using the 2 snapshots at the tips of the branches and the common ancestor of the branches. During the process, a 3-way merge will create a merge commit, which is special in that it has more than 1 parent.
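The 2 kinds of merge can be compared by counting the parents of the resulting commit. This sketch uses hypothetical branch names (feature and topic) and a small helper function defined only for this illustration.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "First Last"
git config user.email "firstlast@mail.com"

# Hypothetical helper: create a file named after its argument and commit it.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "Add $1."; }

commit_file base
git checkout -q -b feature
commit_file feature
git checkout -q main
git merge -q feature                     # main has not moved: fast-forward
ff_parents=$(git cat-file -p HEAD | grep -c '^parent ')

git checkout -q -b topic
commit_file topic
git checkout -q main
commit_file main                         # main and topic have now diverged
git merge -q --no-edit topic             # 3-way merge creates a merge commit
merge_parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "fast-forward tip: $ff_parents parent, merge commit: $merge_parents parents"
```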

Merge changes from a source branch into the current branch from the point at which their histories diverged:

~/project-directory $ git merge nameRemote/nameBranch

~/project-directory $ git merge -m "Merge into 'main'. Add details." nameRemote/nameBranch

~/project-directory $ git merge --edit nameRemote/nameBranch

~/project-directory $ git merge --no-commit nameRemote/nameBranch

Create a merge commit even if a fast-forward merge may be possible:

~/project-directory $ git merge --no-ff nameRemote/nameBranch

Illustration of a merge between 2 branches as a fast-forward (simply move the pointer of the branch forward):

Illustration of a merge between 2 branches as a 3-way merge (tips and common ancestor of the branches):

With a 3-way merge, the common ancestor is used as a base and serves as a reference upon which more complex logic can be performed. The logic is tasked with using algorithms to determine whether the separate versions of files differ in ways which are irreconcilable. If they cannot be reconciled due to different changes to the same parts of the file, a merge conflict or multiple merge conflicts are logged, which will prevent the merge from being completed until the issue has been solved (the merge pauses before the merge commit is created). A merge conflict can be resolved through manual intervention in a similar method of modifying, staging, and committing files, as the merge conflict will be marked with <<<<<<< (current branch), =======, and >>>>>>> (source branch) in the respective file. This involves performing modifications to resolve the merge conflict (remove markers and choose the content of either the current branch, source branch, or something else), staging the changes as part of the commit, and then completing the merge commit. Because of the possibility of merge conflicts, starting a merge with non-trivial uncommitted changes is discouraged.
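A merge conflict and its manual resolution can be reproduced in a scratch repository. The sketch below assumes a hypothetical branch named feature which changes the same line as main; the resolved content is arbitrary.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "First Last"
git config user.email "firstlast@mail.com"
echo "base" > Example.txt
git add Example.txt
git commit -q -m "Add example."

git checkout -q -b feature
echo "feature" > Example.txt
git commit -q -a -m "Change on feature."
git checkout -q main
echo "main" > Example.txt
git commit -q -a -m "Change on main."

git merge feature > /dev/null || echo "Merge stopped with a conflict."
# Count the 3 kinds of marker lines which Git wrote into the file.
markers=$(grep -c -E '^(<<<<<<< |=======$|>>>>>>> )' Example.txt)

echo "main and feature" > Example.txt    # resolve by hand and remove the markers
git add Example.txt
git commit -q --no-edit                  # concludes the paused merge
parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "markers=$markers parents=$parents"
```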

View merge conflicts resulting from a merge commit and typical workflow for resolving the merge conflict:

~/project-directory $ git merge nameRemote/nameBranch
~/project-directory $ git status
~/project-directory $ git mergetool
~/project-directory $ git merge --continue

Abort a merge after merge conflicts have been realized and reconstruct the original state:

~/project-directory $ git merge --abort

The strategy or method for the logic used in the merge can also be specified. The default option is ort (Ostensibly Recursive's Twin), while other options include recursive, resolve, octopus, ours, and subtree. As an alternative to a 3-way merge, it is possible to combine a rebase with a fast-forward merge. A rebase involves taking the changes which were introduced and committed on the current branch and re-applying them on top of another branch. This operation works by going to the common ancestor of the 2 branches, getting the changes introduced by each commit of the current branch, saving those changes to temporary files, resetting the current branch to the same commit as the branch being rebased onto, and then applying each change in turn from the temporary files. There is no difference in the final content of the integration between following a merge or rebase, but rebasing usually leads to a cleaner history which looks linear - if examining the log of a rebased branch, it appears as if the commits happened in series, even though they may have originally happened in parallel (commonly used in projects with many contributors, such that integration is simple for maintainers). The alternative argument is that the history should be an accurate record and should not be changed.

With a more complex structure, it is possible to transplant a sub-branch which has been split from a parent branch onto another branch with a distant common ancestor, such that the result can be viewed as pretending that the sub-branch was originally split from the other branch. The strategy or method for the logic used in the rebase can also be specified. The default option is ort (Ostensibly Recursive's Twin), while other options include recursive, resolve, octopus, ours, and subtree. An important consideration to emphasize is that a branch should not be rebased if it is used by other collaborators, as a rebase will result in the associated commits being abandoned and re-applied as new commits, which may misalign with work done by collaborators (especially when commands are forced).
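The combination of a rebase with a fast-forward merge can be sketched as follows. The branch name nameBranchFeature mirrors the placeholder used in these notes, and the helper function is defined only for this illustration.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "First Last"
git config user.email "firstlast@mail.com"

# Hypothetical helper: create a file named after its argument and commit it.
commit_file() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "Add $1."; }

commit_file base
git checkout -q -b nameBranchFeature
commit_file feature
git checkout -q main
commit_file main                         # the 2 branches have now diverged

git checkout -q nameBranchFeature
git rebase -q main                       # replay the feature commit on top of main
git checkout -q main
git merge -q nameBranchFeature           # fast-forward, no merge commit is needed
merges=$(git rev-list --merges --count main)
total=$(git rev-list --count main)
git log --pretty=oneline --graph
echo "merges=$merges total=$total"
```

The log is expected to show 3 commits in a single line of history without any merge commit.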

Rebase changes from a source branch into the current branch from the point at which their histories diverged:

~/project-directory $ git checkout nameBranchFeature
~/project-directory $ git rebase nameBranchMain
~/project-directory $ git switch nameBranchMain
~/project-directory $ git merge nameBranchFeature

Rebase changes from a source branch when the common ancestor was originally split from another branch:

~/project-directory $ git rebase --onto nameBranchMain nameBranchParent nameBranchCurrent

View merge conflicts resulting from a rebase and typical workflow for resolving the merge conflict:

~/project-directory $ git rebase nameBranchMain
~/project-directory $ git status
~/project-directory $ git mergetool
~/project-directory $ git rebase --continue

Abort a rebase after merge conflicts have been realized and reconstruct the original state:

~/project-directory $ git rebase --abort

Illustration of a rebase which can then be followed by a fast-forward merge to update other branches:

Illustration of a rebase with a sub-branch which has been split from a parent branch and main line of development:

Graphical User Interfaces

There are several graphical user interfaces which act as clients to allow for intuitive interaction with a repository. The most popular clients include Sourcetree (although it is proprietary and only available on Windows and Mac), Git Extensions (although it is only available on Windows), Gitnuro, GitFiend, MeGit, Gittyup, GitQlient, Kommit, Gitg, and Giggle. The basic functionality and operations will be available in most clients (such as fetching, pushing, tagging, branching, and merging), although it is not possible to map all of the commands and every option. The most convenient use may be as a viewer, where it is simple to see past commits and complete history of the project, and for choosing which files are to be staged for the next commit, where changes or partial changes can be selectively picked.

https://github.com/JetpackDuba/Gitnuro https://github.com/eclipsesource/megit https://github.com/Murmele/Gittyup https://github.com/francescmm/GitQlient https://invent.kde.org/sdk/kommit https://github.com/GNOME/gitg

Screenshots of the graphical user interface of Gitnuro with various example repositories: